Using boosting to improve a hybrid HMM/neural network speech recognizer
Abstract
"Boosting" is a general method for improving the performance of almost any learning algorithm. A recently proposed and very promising boosting algorithm is AdaBoost [7]. In this paper we investigate whether AdaBoost can be used to improve a hybrid HMM/neural network continuous speech recognizer. Boosting significantly improves the word error rate from 6.3% to 5.3% on a test set of the OGI Numbers95 corpus, a medium-size continuous numbers recognition task. These results compare favorably with other combining techniques that use several different feature representations or additional information from longer time spans.

Ensemble methods, or committees of learning machines, can often improve the performance of a system in comparison to a single learning machine. A recently proposed and very promising boosting algorithm is AdaBoost [7]. It constructs a composite classifier by sequentially training classifiers while putting more and more emphasis on certain patterns. Several authors have reported important improvements over a single classifier on several machine learning benchmark problems from the UCI repository, e.g. [2, 6]. These experiments displayed rather intriguing generalization properties, such as a continued decrease of the generalization error after the training error has reached zero. However, most of these databases are very small (only several hundred training examples) and contain no significant amount of noise. There is also recent evidence that AdaBoost may very well overfit if several hundred thousand classifiers are combined [8], and [5] reports severe performance degradation of AdaBoost when 20% noise is added to the class labels. In summary, the reasons for the impressive success of AdaBoost are still not completely understood. To the best of our knowledge, an application of AdaBoost to a real-world problem has not yet been reported in the literature either.

In this paper we investigate whether AdaBoost can be applied to boost the performance of a continuous speech recognition system. In this domain we have to deal with large amounts of data (often more than 1 million training examples) and inherently noisy phoneme labels.

The paper is organized as follows. In the next two sections we summarize the AdaBoost algorithm and our baseline speech recognizer. In the third section we show how AdaBoost can be applied to this task, report results on the Numbers95 corpus, and compare them with other classifier combination techniques. The paper finishes with a conclusion and perspectives for future work.

1. ADABOOST

AdaBoost constructs a composite classifier by sequentially training classifiers while putting more and more emphasis on certain patterns. For this, AdaBoost maintains a probability distribution $D_t(i)$ over the original training set. In each round $t$ the classifier is trained with respect to this distribution. Some learning algorithms do not allow training with respect to a weighted cost function. In this case, sampling with replacement (using the distribution $D_t$) can be used to approximate a weighted cost function: examples with high probability then occur more often than those with low probability, while some examples may not occur in the sample at all although their probability is not zero.
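As a concrete illustration of this resampling step, here is a minimal Python/NumPy sketch; the function name and the array-based data layout are our own assumptions for illustration, not code from the paper.

```python
import numpy as np

def resample_training_set(X, y, D, rng=None):
    """Draw a new training set of the same size, with replacement,
    according to the current AdaBoost distribution D (D[i] is the
    probability of example i; the entries of D sum to 1).

    Examples with high probability tend to occur several times in the
    resampled set, while some low-probability examples may not occur
    at all, which approximates training with a weighted cost function.
    """
    rng = rng or np.random.default_rng()
    n = len(X)
    idx = rng.choice(n, size=n, replace=True, p=D)
    return X[idx], y[idx]
```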
Previous experiments have shown that the best results in terms of training time and generalization error are obtained when a new training set is resampled from the original training set after each epoch [10]. After each round, the probability of incorrectly labeled examples is increased and the probability of correctly labeled examples is decreased. The result of training the $t$-th classifier is a hypothesis $h_t : X \rightarrow Y$, where $Y = \{1, \ldots, k\}$ is the space of labels and $X$ is the space of input features. After the $t$-th round, the weighted error $\epsilon_t$ of the resulting classifier is calculated and the distribution $D_{t+1}$ is computed from $D_t$ by increasing the probability of incorrectly labeled examples. The probabilities are changed so that the error of the $t$-th classifier using these new "weights" $D_{t+1}$ would be 0.5. In this way the classifiers are optimally decoupled. The global decision $f$ is obtained by weighted voting. Figure 2 (left) summarizes the basic AdaBoost algorithm.

In general, neural network classifiers provide more information than just a class label: it can be shown that the network outputs approximate the a-posteriori probabilities of the classes, and it should be reasonable to use this information rather than performing a hard decision for one recognized class. This issue is addressed by another version of AdaBoost, called AdaBoost.M2 [7], which can be used when the classifier computes confidence scores for each class. The result of training the $t$-th classifier is now a hypothesis $h_t : X \times Y \rightarrow [0, 1]$. Furthermore, we use a distribution over the set of all mislabels: $B = \{(i, y) : i \in \{1, \ldots, N\},\ y \neq y_i\}$, where $N$ is the number of training examples; therefore $|B| = N(k - 1)$. AdaBoost modifies this distribution so that the next learner focuses not only on the examples that are hard to classify, but more specifically on the incorrect labels against which it is hardest to discriminate. Note that the mislabel distribution $D_t$ induces a distribution over the examples: $P_t(i) = W_i^t / \sum_{j=1}^{N} W_j^t$, where $W_i^t = \sum_{y \neq y_i} D_t(i, y)$.
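The AdaBoost.M2 bookkeeping described above can be sketched as follows, following the pseudo-loss formulation of [7]. This is a minimal NumPy sketch under our own naming and layout assumptions (confidences held in an N-by-k matrix), not the recognizer's actual implementation.

```python
import numpy as np

def adaboost_m2_round(H, y, D):
    """One AdaBoost.M2 update, following the pseudo-loss formulation in [7].

    H : (N, k) array, H[i, y] = confidence h_t(x_i, y) in [0, 1]
    y : (N,) integer array of correct labels y_i
    D : (N, k) mislabel distribution D_t(i, y); entries with y == y_i
        are zero and the remaining N*(k-1) entries sum to 1.
    """
    N, k = H.shape
    h_correct = H[np.arange(N), y]               # h_t(x_i, y_i)
    # pseudo-loss over the mislabel set B = {(i, y) : y != y_i}
    eps = 0.5 * np.sum(D * (1.0 - h_correct[:, None] + H))
    beta = eps / (1.0 - eps)
    # raise the relative weight of mislabels that are hard to discriminate against
    D_next = D * beta ** (0.5 * (1.0 + h_correct[:, None] - H))
    D_next[np.arange(N), y] = 0.0                # correct labels stay out of B
    D_next /= D_next.sum()
    # induced distribution over examples: P_t(i) = W_i^t / sum_j W_j^t
    P = D_next.sum(axis=1)
    return D_next, P, beta

def adaboost_m2_decision(Hs, betas):
    """Weighted vote f(x) = argmax_y sum_t log(1/beta_t) h_t(x, y)."""
    score = sum(np.log(1.0 / b) * H for H, b in zip(Hs, betas))
    return score.argmax(axis=1)
```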
Similar papers
A new hybrid structure of speech recognizer based on HMM and neural network
In this paper, we introduce a new framework for a speech recognizer based on an HMM and a neural net. Unlike the traditional hybrid system, the neural net is used as a post-processor, which classifies the speech data segmented by the HMM recognizer. The purpose of this method is to improve the top-choice accuracy of the HMM-based speech recognition system in our lab. Major issues such as how to use the segment...
Improved Hidden Markov Model Speech Recognition Using Radial Basis Function Networks
A high performance speaker-independent isolated-word hybrid speech recognizer was developed which combines Hidden Markov Models (HMMs) and Radial Basis Function (RBF) neural networks. In recognition experiments using a speaker-independent E-set database, the hybrid recognizer had an error rate of 11.5% compared to 15.7% for the robust unimodal Gaussian HMM recognizer upon which the hybrid syste...
Development of a French speech recognizer using a hybrid HMM/MLP system
In this paper we describe the development of a French speech recognizer and the experiments we carried out on our hybrid HMM/ANN system, which combines Artificial Neural Networks (ANNs) and Hidden Markov Models (HMMs). A phone recognition experiment with our baseline system achieved a phone accuracy of about 75%, which is very similar to the best results reported in the literature [1]. Preliminary...
Convolutional neural network with adaptable windows for speech recognition
Although speech recognition systems are widely used and their accuracy is continuously increasing, there is a considerable performance gap between their accuracy and human recognition ability. This is partially due to high speaker variability in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Speaker adaptation using regularization and network adaptation for hybrid MMI-NN/HMM speech recognition
This paper describes how to perform speaker adaptation for a hybrid large-vocabulary speech recognition system. The hybrid system is based on a Maximum Mutual Information Neural Network (MMINN), which is used as a Vector Quantizer (VQ) for a discrete HMM speech recognizer. The combination of MMINNs and HMMs has shown good performance on several large-vocabulary speech recognition tasks like RM...